1. Dataset Overview
2. Investigation Overview
3. Univariate Exploration
4. Bivariate Exploration
5. Multivariate Exploration
6. References
Prosper was founded in 2005 as the first peer-to-peer lending marketplace in the United States. Since then, Prosper has facilitated more than $22 billion in loans to more than 1,320,000 people.
Through Prosper, people can invest in each other in a way that is financially and socially rewarding.
The dataset consists of over 110,000 peer-to-peer loans issued on the lending platform Prosper and includes more than 80 variables. I concentrated on approximately ten of these variables, adjusting my selection to remove those that lacked data for the loans I was examining. I also removed 39 outliers who reported earning more than $50,000 per month, because they were skewing the statistics. This left roughly 77 thousand loans for the study. I grouped the various compliance and delinquency levels into two categories: Compliant and Delinquent.
In this investigation, I wanted to look at the loan characteristics that could be used to predict borrower APR. The main considerations were the original loan amount, the borrower's Prosper rating, loan term, stated monthly income, employment status, and occupation.
The cleaned dataset contains 76,224 loans and 14 variables. Most of the variables are numerical, while Loan Status is a nominal categorical variable.
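The wrangling steps described above can be sketched roughly as follows. This is a minimal illustration on a made-up six-row table, not the notebook's actual code: the status labels, the `ListingKey` column, and in particular which statuses count as Compliant are all assumptions.

```python
import pandas as pd

# Hypothetical miniature of the raw loan table (the real file has 80+ columns).
raw = pd.DataFrame({
    "LoanStatus": ["Current", "Completed", "Chargedoff",
                   "Past Due (1-15 days)", "Defaulted", "Current"],
    "StatedMonthlyIncome": [4200.0, 5100.0, 3000.0, 61000.0, 2500.0, 7800.0],
    "LoanOriginalAmount": [4000, 10000, 15000, 4000, 2000, 25000],
    "BorrowerAPR": [0.21, 0.12, 0.30, 0.18, 0.35, 0.09],
    "ListingKey": list("abcdef"),  # stand-in for a column outside the analysis
})

# Keep only the variables of interest.
keep = ["LoanStatus", "StatedMonthlyIncome", "LoanOriginalAmount", "BorrowerAPR"]
loans = raw[keep]

# Drop borrowers who report more than $50,000 per month.
loans = loans[loans["StatedMonthlyIncome"] <= 50_000].copy()

# Collapse the statuses into two groups. This split is an assumption.
compliant = {"Current", "Completed"}
loans["LoanStatus"] = loans["LoanStatus"].map(
    lambda status: "Compliant" if status in compliant else "Delinquent"
)

print(loans["LoanStatus"].value_counts())
```

Filtering before recoding keeps the two steps independent, so the income threshold can be changed without touching the status mapping.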
The main features of interest are the original loan amount, the borrower annual percentage rate (BorrowerAPR), and BorrowerRate. The goal is to identify which factors affect them.
I predict that the borrower's stated monthly income, the original loan amount requested, employment status, and occupation will affect the features of interest.
The distributions of BorrowerAPR and BorrowerRate are multimodal.
From the histogram below, $4k, $10k and $15k are the most frequently borrowed amounts on Prosper.
[Figure: Value Count of the Occupation Type]
The Borrower APR distribution is slightly multimodal, with values between 0.05 and 0.4.
The distribution of stated monthly income is skewed to the right, and 97% of the borrowers earn below $15k per month.
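Both parts of the income observation (the right skew and the share of borrowers under a threshold) can be checked numerically as well as visually. The sketch below uses synthetic log-normal incomes, not the Prosper data, so its printed numbers say nothing about the real distribution.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic right-skewed incomes (log-normal); NOT the Prosper data.
income = pd.Series(
    rng.lognormal(mean=8.5, sigma=0.5, size=5_000),
    name="StatedMonthlyIncome",
)

skewness = income.skew()                  # > 0 indicates a right-skewed distribution
share_below = (income < 15_000).mean()    # fraction of borrowers under the threshold

print(f"skewness = {skewness:.2f}")
print(f"share below 15k = {share_below:.1%}")
```

Taking the mean of a boolean Series gives the proportion of `True` values, which is a compact way to obtain a figure such as the 97% quoted above.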
Another observation shows that most of the borrowers are employed.
To keep the analysis of the sub-dataset clean, I dropped all rows with null values, since there were too few of them to bias the final result.
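Before dropping nulls, it can help to measure how many rows would be lost. The following is a small sketch on a hypothetical five-row frame, using column names that appear in this report.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; two of them contain a missing value.
df = pd.DataFrame({
    "BorrowerAPR": [0.12, np.nan, 0.25, 0.31, 0.18],
    "ProsperRating (Alpha)": ["A", "C", None, "HR", "B"],
    "LoanOriginalAmount": [10000, 4000, 15000, 2000, 8000],
})

# Fraction of rows with at least one null value.
missing_share = df.isna().any(axis=1).mean()

# Drop those rows and renumber the index.
clean = df.dropna().reset_index(drop=True)

print(f"rows with nulls: {missing_share:.0%}")
print(f"rows kept: {len(clean)} of {len(df)}")
```

If `missing_share` turned out to be large, dropping rows could bias the results, and a different strategy would be worth considering.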
Will a higher loan amount attract a lower BorrowerAPR? I predict it will, but rather than rely on my assumption, let the data show us graphically.
We observed a negative correlation between Loan Original Amount and Borrower APR. As I predicted earlier, higher loan amounts had a lower borrower annual percentage rate.
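A correlation coefficient puts a number on the direction and strength of this kind of relationship. The sketch below computes Pearson's r with pandas on synthetic data that was deliberately generated with a downward trend, so the sign of the result is built in and is not evidence about the real loans.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Synthetic data: APR falls with the amount borrowed, plus noise.
amount = rng.uniform(1_000, 35_000, size=2_000)
apr = 0.30 - 4e-6 * amount + rng.normal(0, 0.05, size=2_000)
df = pd.DataFrame({"LoanOriginalAmount": amount, "BorrowerAPR": apr})

# Series.corr uses the Pearson method by default.
r = df["LoanOriginalAmount"].corr(df["BorrowerAPR"])
print(f"Pearson r = {r:.2f}")
```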
How is employment status distributed across the different Prosper ratings?
1. There is not enough data for the Part-time, Retired, Self-employed and Not employed statuses to show how they interact with ProsperRating (Alpha).
2. Most of the employed borrowers were C-rated, followed by B and A respectively. Fewer than 5,000 borrowers had an AA rating, which is the highest.
[Figure: Prosper Rating (Alpha) across EmploymentStatus]
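The counts behind a chart like this one can be tabulated directly, which makes sparse categories easy to spot. This is a sketch on a handful of invented rows. The column order follows Prosper's letter scale from best (AA) to worst (HR).

```python
import pandas as pd

rating_order = ["AA", "A", "B", "C", "D", "E", "HR"]  # best to worst

# Invented rows for illustration only.
df = pd.DataFrame({
    "EmploymentStatus": ["Employed", "Employed", "Employed", "Full-time",
                         "Self-employed", "Employed", "Retired"],
    "ProsperRating (Alpha)": ["C", "B", "C", "A", "D", "AA", "C"],
})

# Count loans per (status, rating) pair; reindex so every rating has a column.
counts = (
    pd.crosstab(df["EmploymentStatus"], df["ProsperRating (Alpha)"])
      .reindex(columns=rating_order, fill_value=0)
)
print(counts)
```

Rows that are mostly zeros in such a table correspond to the employment statuses with too little data to interpret.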
Does employment status influence the loan amount requested?
Borrowers with the Employed, Self-employed and Other employment statuses borrow larger amounts than Part-time, Retired, Full-time and Not employed borrowers.
[Figure: Relationship Between Loan Original Amount & Employment Status]
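A grouped summary is one way to back up a comparison like this with numbers. The sketch below ranks employment statuses by median loan amount and also reports the group sizes. The rows are invented for illustration.

```python
import pandas as pd

# Invented rows for illustration only.
df = pd.DataFrame({
    "EmploymentStatus": ["Employed", "Employed", "Part-time", "Retired",
                         "Self-employed", "Self-employed", "Part-time"],
    "LoanOriginalAmount": [10000, 15000, 3000, 4000, 12000, 8000, 2000],
})

# Group size and median loan amount per status, largest median first.
summary = (
    df.groupby("EmploymentStatus")["LoanOriginalAmount"]
      .agg(["count", "median"])
      .sort_values("median", ascending=False)
)
print(summary)
```

The median is used here rather than the mean because loan amounts cluster at a few round values and can be skewed by large loans.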
Does any form of correlation exist between LoanOriginalAmount and BorrowerRate?
There is a negative correlation between LoanOriginalAmount and BorrowerRate.
As I expected, the interest rate is lower for higher loan amounts: the trendline of the scatter plot shows the negative correlation.
[Figure: Correlation Between BorrowerRate and Loan Original Amount]
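The trendline on a scatter plot of this kind is a least-squares straight line, and its slope can be computed directly. Here is a minimal sketch with NumPy on synthetic data generated with a downward trend. The numbers are illustrative and are not taken from the Prosper loans.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data: the rate falls slightly as the amount grows, plus noise.
amount = rng.uniform(1_000, 35_000, size=1_000)
rate = 0.28 - 3e-6 * amount + rng.normal(0, 0.04, size=1_000)

# Degree-1 least-squares fit; polyfit returns the highest-degree coefficient first.
slope, intercept = np.polyfit(amount, rate, deg=1)

print(f"change in rate per $1,000 borrowed: {slope * 1_000:+.4f}")
```

A negative slope corresponds to a downward trendline, which is the visual signature of a negative correlation.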
How does the BorrowerAPR compare to the loan Term?
36-month loans have a higher BorrowerAPR than 12-month or 60-month loans.
[Figure: Relationship Between Term and BorrowerAPR]
My key interest here is to investigate how the relationship between LoanOriginalAmount and BorrowerAPR is affected by categorical variables such as Term and ProsperRating (Alpha).
As a bonus, I will also explore the same effect on the relationship between LoanOriginalAmount and BorrowerRate.
What is the impact of Term on the relationship between loan amount and Borrower APR, using regplot? (*Bonus: replace Borrower APR with BorrowerRate and observe whether the trend is the same.)
Generally, there is a negative correlation between LoanOriginalAmount and BorrowerRate for all three terms. A similar trend can be observed between LoanOriginalAmount and BorrowerAPR.
[Figure: Correlation Between BorrowerAPR and Loan Original Amount]
[Figure: Correlation Between BorrowerRate and Loan Original Amount]
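One way to draw a regression plot per loan term is seaborn's `lmplot`, the figure-level wrapper around `regplot`, with one panel per Term. The sketch below is not the notebook's original plotting code, and it runs on synthetic data in which every term is given the same downward trend.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(3)

# Synthetic loans for the three terms; illustration only.
frames = []
for term in (12, 36, 60):
    amount = rng.uniform(1_000, 35_000, size=200)
    apr = 0.30 - 3e-6 * amount + rng.normal(0, 0.04, size=200)
    frames.append(pd.DataFrame(
        {"Term": term, "LoanOriginalAmount": amount, "BorrowerAPR": apr}
    ))
df = pd.concat(frames, ignore_index=True)

# One regression panel per loan term.
g = sns.lmplot(
    data=df, x="LoanOriginalAmount", y="BorrowerAPR", col="Term",
    scatter_kws={"alpha": 0.2}, line_kws={"color": "black"}, height=3,
)
g.set_axis_labels("Loan Original Amount ($)", "Borrower APR")
```

Swapping `y="BorrowerAPR"` for a `BorrowerRate` column would produce the bonus comparison in the same layout.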
How does Prosper Rating affect the relationship between Borrower APR (Annual Percentage Rate) and Loan Original Amount?
For the highest-rated borrowers, Borrower APR and Loan Original Amount have a positive link. However, the relationship becomes negative as the rating drops from AA to HR. I believe that Prosper purposefully increases the APR for highly rated customers as the requested loan amount increases, in order to maximise returns (possibly because these customers have been with Prosper for a long time and are already loyal to the brand). In contrast, borrowers with lower Prosper ratings have lower APRs as the loan amount increases. I think this is done on purpose to entice new clients, who most likely have low ratings, to try out the service.
[Figure: Correlation Between BorrowerAPR and Loan Original Amount]
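Whether the trend changes sign between ratings can be checked by fitting a separate line for each rating and comparing the slopes. The sketch below does this with `groupby` and `np.polyfit`. The data is synthetic, with slopes chosen by hand to exercise the calculation, so the output demonstrates the method and not the actual finding.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)

# Hand-picked slopes for three ratings; purely illustrative assumptions.
true_slopes = {"AA": 1e-6, "C": -2e-6, "HR": -4e-6}

frames = []
for rating, slope in true_slopes.items():
    amount = rng.uniform(1_000, 35_000, size=500)
    apr = 0.20 + slope * amount + rng.normal(0, 0.01, size=500)
    frames.append(pd.DataFrame({
        "ProsperRating (Alpha)": rating,
        "LoanOriginalAmount": amount,
        "BorrowerAPR": apr,
    }))
df = pd.concat(frames, ignore_index=True)


def fitted_slope(group: pd.DataFrame) -> float:
    """Least-squares slope of BorrowerAPR against LoanOriginalAmount."""
    return float(np.polyfit(group["LoanOriginalAmount"], group["BorrowerAPR"], deg=1)[0])


# Fit one line per rating.
slopes = {
    rating: fitted_slope(group)
    for rating, group in df.groupby("ProsperRating (Alpha)")
}
for rating in true_slopes:
    print(rating, f"{slopes[rating]:+.2e}")
```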
How does Prosper Rating affect the relationship between Borrower Rate (interest rate) and Loan Original Amount?
The same conclusion can be drawn for the relationship between Loan Original Amount and Borrower Rate as for Loan Original Amount and BorrowerAPR above.
[Figure: Correlation Between BorrowerRate and Loan Original Amount]
Using a Seaborn pointplot, can we see how loan term affects the relationship between ProsperRating and BorrowerAPR?
Highly rated borrowers (AA to B) have a lower APR, and it rises incrementally as the loan term increases from 12 to 60 months. Poorly rated borrowers attract a higher APR.
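A pointplot of this kind might be set up as in the sketch below, with the rating on the x-axis, mean APR on the y-axis, and one line per term. The data is synthetic (APR simply rises as the rating worsens, with no term effect built in), and the styling choices are assumptions, not the notebook's settings.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rating_order = ["AA", "A", "B", "C", "D", "E", "HR"]  # best to worst
rng = np.random.default_rng(5)
n = 700

# Synthetic loans with random ratings and terms; illustration only.
df = pd.DataFrame({
    "ProsperRating (Alpha)": rng.choice(rating_order, size=n),
    "Term": rng.choice([12, 36, 60], size=n),
})
rank = df["ProsperRating (Alpha)"].map({r: i for i, r in enumerate(rating_order)})
df["BorrowerAPR"] = 0.08 + 0.04 * rank + rng.normal(0, 0.02, size=n)

fig, ax = plt.subplots(figsize=(8, 4))
sns.pointplot(
    data=df, x="ProsperRating (Alpha)", y="BorrowerAPR", hue="Term",
    order=rating_order, dodge=0.3, ax=ax,
)
ax.set_title("BorrowerAPR by ProsperRating (Alpha) and Term")
```

Passing `order=rating_order` keeps the ratings in their natural sequence, and `dodge` offsets the three term lines so their error bars do not overlap.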
Visualise the impact of Term on the relationships between ProsperRating (Alpha) and LoanOriginalAmount, and between ProsperRating (Alpha) and StatedMonthlyIncome.
Borrowers with a high monthly income and a high Prosper rating tend to take 12-month loans.
[Figure: y-axis StatedMonthlyIncome]
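The averages that such plots display can also be laid out as a table, with ratings as rows and terms as columns. The sketch below uses `pivot_table` on a few invented rows. Rating and term combinations that do not occur in the sample show up as NaN.

```python
import pandas as pd

# Invented rows for illustration only.
df = pd.DataFrame({
    "ProsperRating (Alpha)": ["AA", "AA", "C", "C", "HR", "HR"],
    "Term": [12, 36, 36, 60, 12, 36],
    "StatedMonthlyIncome": [9000.0, 7000.0, 5000.0, 5500.0, 3000.0, 3500.0],
    "LoanOriginalAmount": [12000, 15000, 8000, 10000, 3000, 4000],
})

# Mean income and mean loan amount for every rating/term combination.
table = df.pivot_table(
    index="ProsperRating (Alpha)",
    columns="Term",
    values=["StatedMonthlyIncome", "LoanOriginalAmount"],
    aggfunc="mean",
)
print(table.round(0))
```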
Borrower APR and Loan Original Amount are negatively related when the borrower's Prosper rating is between HR and B but, unexpectedly, positively related when the rating is A or AA.
Further exploration of the influence of loan term and Prosper rating on the original loan amount shows that the amount increases with better ratings for all three terms.
Another intriguing finding is that for borrowers rated HR to C, the borrower APR decreases as the loan term lengthens, whereas for those rated B to AA the APR rises with the length of the loan.